Results 1 - 20 of 28
1.
Sci Rep ; 14(1): 3432, 2024 02 10.
Article in English | MEDLINE | ID: mdl-38341450

ABSTRACT

Many nocturnally active fireflies use precisely timed bioluminescent patterns to identify mates, making them especially vulnerable to light pollution. As urbanization continues to brighten the night sky, firefly populations are under constant stress, and close to half of the species are now threatened. Ensuring the survival of firefly biodiversity depends on a large-scale conservation effort to monitor and protect thousands of populations. While species can be identified by their flash patterns, current methods require expert measurement and manual classification and are infeasible given the number and geographic distribution of fireflies. Here we present the application of a recurrent neural network (RNN) for accurate automated firefly flash pattern classification. Using recordings from commodity cameras, we can extract flash trajectories of individuals within a swarm and classify their species with an accuracy of approximately seventy percent. In addition to its potential in population monitoring, automated classification provides the means to study firefly behavior at the population level. We employ the classifier to measure and characterize the variability within and between swarms, unlocking a new dimension of their behavior. Our method is open source, and deployment in community science applications could revolutionize our ability to monitor and understand firefly populations.


Subjects
Fireflies; Sexual Behavior, Animal; Humans; Animals
3.
Nat Methods ; 19(4): 445-448, 2022 04.
Article in English | MEDLINE | ID: mdl-35396485

ABSTRACT

Structural variants are associated with cancers and developmental disorders, but challenges with estimating population frequency remain a barrier to prioritizing mutations over inherited variants. In particular, variability in variant calling heuristics and filtering limits the use of current structural variant catalogs. We present STIX, a method that, instead of relying on variant calls, indexes and searches the raw alignments from thousands of samples to enable more comprehensive allele frequency estimation.


Subjects
Genome; Genomic Structural Variation; Neoplasms; Algorithms; Genomic Structural Variation/genetics; High-Throughput Nucleotide Sequencing; Humans; Neoplasms/genetics; Software
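The STIX abstract above describes estimating frequency from raw alignments instead of from variant calls. The following is a minimal sketch of that general idea only. The input format (a count of supporting reads per sample), the `min_support` threshold, and the function name are assumptions made for illustration; they are not STIX's interface or algorithm, and the value returned is the fraction of samples showing evidence, a proxy that is not an allele frequency.

```python
def estimate_sv_frequency(evidence_by_sample, min_support=2):
    """Fraction of samples with at least `min_support` supporting reads.

    evidence_by_sample maps a sample ID to the number of alignments that
    support the queried structural variant (hypothetical input format).
    """
    if not evidence_by_sample:
        return 0.0
    carriers = sum(1 for n in evidence_by_sample.values() if n >= min_support)
    return carriers / len(evidence_by_sample)


# Toy data: four samples, two of which reach the support threshold.
evidence = {"s1": 0, "s2": 5, "s3": 1, "s4": 3}
frequency = estimate_sv_frequency(evidence)
```

Because the count is taken over per-sample evidence and not over a call set, samples filtered out by a caller's heuristics would still contribute here, which is the motivation the abstract gives.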
4.
Am J Hum Genet ; 109(4): 680-691, 2022 04 07.
Article in English | MEDLINE | ID: mdl-35298919

ABSTRACT

Identification of rare-variant associations is crucial to full characterization of the genetic architecture of complex traits and diseases. Essential in this process is the evaluation of novel methods in simulated data that mirror the distribution of rare variants and haplotype structure in real data. Additionally, importing real-variant annotation enables in silico comparison of methods, such as rare-variant association tests and polygenic scoring methods, that focus on putative causal variants. Existing simulation methods are either unable to employ real-variant annotation or severely under- or overestimate the number of singletons and doubletons, thereby reducing the ability to generalize simulation results to real studies. We present RAREsim, a flexible and accurate rare-variant simulation algorithm. Using parameters and haplotypes derived from real sequencing data, RAREsim efficiently simulates the expected variant distribution and enables real-variant annotations. We highlight RAREsim's utility across various genetic regions, sample sizes, ancestries, and variant classes.


Subjects
Genetic Variation; Research Design; Computer Simulation; Genetic Variation/genetics; Haplotypes/genetics; Humans; Models, Genetic; Multifactorial Inheritance
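The RAREsim abstract above evaluates simulations by the number of singletons and doubletons they produce. As a hedged illustration of what those quantities are, the sketch below counts them in a small haplotype matrix. The matrix layout (one row per haplotype, one 0/1 column per variant site) and the function name are assumptions for this example; this is not RAREsim's code.

```python
from collections import Counter


def allele_count_spectrum(haplotypes):
    """Map each allele count to the number of sites that have that count.

    haplotypes is a list of rows, one per haplotype, each holding 0/1
    alternate-allele indicators per site (hypothetical layout).
    Sites with no alternate allele are left out of the spectrum.
    """
    if not haplotypes:
        return Counter()
    site_counts = [sum(column) for column in zip(*haplotypes)]
    return Counter(c for c in site_counts if c > 0)


# Toy data: four haplotypes over four sites.
haplotypes = [
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
spectrum = allele_count_spectrum(haplotypes)
singletons = spectrum[1]  # sites where exactly one haplotype carries the allele
doubletons = spectrum[2]  # sites where exactly two haplotypes carry the allele
```

Comparing these counts between simulated and real haplotypes is one simple way to check the under- or overestimation the abstract refers to.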
5.
Genome Biol ; 22(1): 161, 2021 05 25.
Article in English | MEDLINE | ID: mdl-34034781

ABSTRACT

Visual validation is an important step to minimize false-positive predictions from structural variant (SV) detection. We present Samplot, a tool for creating images that display the read depth and sequence alignments necessary to adjudicate purported SVs across samples and sequencing technologies. These images can be rapidly reviewed to curate large SV call sets. Samplot is applicable to many biological problems such as SV prioritization in disease studies, analysis of inherited variation, or de novo SV review. Samplot includes a machine learning package that dramatically decreases the number of false positives without human review. Samplot is available at https://github.com/ryanlayer/samplot.


Subjects
Genomic Structural Variation; Software; Automation; Chromosome Inversion; Gene Duplication; Reproducibility of Results; Translocation, Genetic
6.
Front Genet ; 12: 639355, 2021.
Article in English | MEDLINE | ID: mdl-33732289

ABSTRACT

Genomic structural variants (SVs) are a major source of genetic and phenotypic variation but have not been investigated systematically in rainbow trout (Oncorhynchus mykiss), an important aquaculture species of cold freshwater. The objectives of this study were 1) to identify and validate high-confidence SVs in rainbow trout using whole-genome re-sequencing; and 2) to examine the contribution of transposable elements (TEs) to SVs in rainbow trout. A total of 96 rainbow trout, including 11 homozygous lines and 85 outbred fish from three breeding populations, were whole-genome sequenced with an average genome coverage of 17.2×. Putative SVs were identified using the program Smoove which integrates LUMPY and other associated tools into one package. After rigorous filtering, 13,863 high-confidence SVs were identified. Pacific Biosciences long-reads of Arlee, one of the homozygous lines used for SV detection, validated 98% (3,948 of 4,030) of the high-confidence SVs identified in the Arlee homozygous line. Based on principal component analysis, the 85 outbred fish clustered into three groups consistent with their populations of origin, further indicating that the high-confidence SVs identified in this study are robust. The repetitive DNA content of the high-confidence SV sequences was 86.5%, which is much higher than the 57.1% repetitive DNA content of the reference genome, and is also higher than the repetitive DNA content of Atlantic salmon SVs reported previously. TEs thus contribute substantially to SVs in rainbow trout as TEs make up the majority of repetitive sequences. Hundreds of the high-confidence SVs were annotated as exon-loss or gene-fusion variants, and may have phenotypic effects. The high-confidence SVs reported in this study provide a foundation for further rainbow trout SV studies.

7.
Nat Commun ; 11(1): 5176, 2020 10 14.
Article in English | MEDLINE | ID: mdl-33056985

ABSTRACT

Structural variants (SVs) are a major source of genetic and phenotypic variation, but remain challenging to accurately type and are hence poorly characterized in most species. We present an approach for reliable SV discovery in non-model species using whole genome sequencing and report 15,483 high-confidence SVs in 492 Atlantic salmon (Salmo salar L.) sampled from a broad phylogeographic distribution. These SVs recover population genetic structure with high resolution, include an active DNA transposon, widely affect functional features, and overlap more duplicated genes retained from an ancestral salmonid autotetraploidization event than expected. Changes in SV allele frequency between wild and farmed fish indicate polygenic selection on behavioural traits during domestication, targeting brain-expressed synaptic networks linked to neurological disorders in humans. This study offers novel insights into the role of SVs in genome evolution and the genetic architecture of domestication traits, along with resources supporting reliable SV discovery in non-model species.


Subjects
Animals, Wild/genetics; Domestication; Genome; Genomic Structural Variation; Salmo salar/genetics; Animals; DNA Transposable Elements/genetics; Fisheries; Gene Duplication; Gene Frequency; Genetic Variation; Genetics, Population; Genotyping Techniques; Male; Molecular Sequence Annotation; Phylogeography; Whole Genome Sequencing; Workflow
8.
Nucleic Acids Res ; 48(12): 6597-6610, 2020 07 09.
Article in English | MEDLINE | ID: mdl-32479598

ABSTRACT

The human genome encodes an order of magnitude more gene expression enhancers than promoters, suggesting that most genes are regulated by the combined action of multiple enhancers. We have previously shown that neighboring estrogen-responsive enhancers exhibit complex synergistic contributions to the production of an estrogenic transcriptional response. Here we sought to determine the molecular underpinnings of this enhancer cooperativity. We generated genetic deletions of four estrogen receptor α (ER) bound enhancers that regulate two genes and found that enhancers containing full estrogen response element (ERE) motifs control ER binding at neighboring sites, while enhancers with pre-existing histone acetylation/accessibility confer a permissible chromatin environment to the neighboring enhancers. Genome engineering revealed that two enhancers with half EREs could not compensate for the lack of a full ERE site within the cluster. In contrast, two enhancers with full EREs produced a transcriptional response greater than the wild-type locus. By swapping genomic sequences, we found that the genomic location of a full ERE strongly influences enhancer activity. Our results lead to a model in which a full ERE is required for ER recruitment, but the presence of a pre-existing permissible chromatin environment can also be needed for estrogen-driven gene regulation to occur.


Subjects
Enhancer Elements, Genetic/genetics; Estrogen Receptor alpha/genetics; Nucleotide Motifs/genetics; Transcription, Genetic; Acetylation; Chromatin/genetics; DNA-Binding Proteins/genetics; Gene Expression Regulation/genetics; Genome, Human/genetics; Humans; Promoter Regions, Genetic/genetics
9.
Nature ; 583(7814): 83-89, 2020 07.
Article in English | MEDLINE | ID: mdl-32460305

ABSTRACT

A key goal of whole-genome sequencing for studies of human genetics is to interrogate all forms of variation, including single-nucleotide variants, small insertion or deletion (indel) variants and structural variants. However, tools and resources for the study of structural variants have lagged behind those for smaller variants. Here we used a scalable pipeline (ref. 1) to map and characterize structural variants in 17,795 deeply sequenced human genomes. We publicly release site-frequency data to create the largest, to our knowledge, whole-genome-sequencing-based structural variant resource so far. On average, individuals carry 2.9 rare structural variants that alter coding regions; these variants affect the dosage or structure of 4.2 genes and account for 4.0-11.2% of rare high-impact coding alleles. Using a computational model, we estimate that structural variants account for 17.2% of rare alleles genome-wide, with predicted deleterious effects that are equivalent to loss-of-function coding alleles; approximately 90% of such structural variants are noncoding deletions (mean 19.1 per genome). We report 158,991 ultra-rare structural variants and show that 2% of individuals carry ultra-rare megabase-scale structural variants, nearly half of which are balanced or complex rearrangements. Finally, we infer the dosage sensitivity of genes and noncoding elements, and reveal trends that relate to element class and conservation. This work will help to guide the analysis and interpretation of structural variants in the era of whole-genome sequencing.


Subjects
Genetic Variation; Genome, Human/genetics; Whole Genome Sequencing; Alleles; Case-Control Studies; Epigenesis, Genetic; Female; Gene Dosage/genetics; Genetics, Population; High-Throughput Nucleotide Sequencing; Humans; Male; Molecular Sequence Annotation; Quantitative Trait Loci; Racial Groups/genetics; Software
10.
Front Genet ; 11: 152, 2020.
Article in English | MEDLINE | ID: mdl-32194629

ABSTRACT

SUMMARY: Genotype Query Tools (GQT) were developed to discover disease-causing variations from billions of genotypes and millions of genomes, and process data at substantially higher speed than other existing methods. While GQT has been available to a wide audience as command-line software, the difficulty of constructing queries among non-IT or non-bioinformatics researchers has limited its applicability. To overcome this limitation, we developed webGQT, an easy-to-use tool with a graphical user interface. With pre-built queries across three modules, webGQT allows for pedigree analysis, case-control studies, and population frequency studies. As a package, webGQT allows researchers with little or no applied bioinformatics/IT experience to mine potential disease-causing variants from billions.
RESULTS: webGQT offers a flexible and easy-to-use interface for model-based candidate variant filtering for Mendelian diseases from thousands to millions of genomes at a reduced computation time. Additionally, webGQT provides adjustable parameters to reduce false positives and rescue missing genotypes across all modules. Using a case study, we demonstrate the applicability of webGQT to query non-human genomes. In addition, we demonstrate the scalability of webGQT on large data sets by implementing complex population-specific queries on the 1000 Genomes Project Phase 3 data set, which includes 8.4 billion variants from 2504 individuals across 26 different populations. Furthermore, webGQT supports filtering single-nucleotide variants, short insertions/deletions, copy number or any other variant genotypes supported by the VCF specification. Our results show that webGQT can be used as an online web service, or deployed on personal computers or local servers within research groups.
AVAILABILITY: webGQT is made available to the users in three forms: 1) as a webserver available at https://vm1138.kaj.pouta.csc.fi/webgqt/, 2) as an R package to install on personal computers, and 3) as part of the same R package to configure on the user's own servers. The application is available for installation at https://github.com/arumds/webgqt.
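The webGQT abstract above mentions model-based candidate variant filtering for Mendelian diseases. A minimal sketch of one such model, a recessive filter over a small genotype table, is given below. The data layout (alternate-allele counts 0, 1 or 2 per sample), the sample names, and the function name are assumptions for illustration; this is not GQT's query syntax, index, or implementation.

```python
def recessive_candidates(genotypes, affected, unaffected):
    """Return variant IDs consistent with a simple recessive model.

    genotypes maps variant ID -> {sample ID: alternate allele count}.
    A variant passes when every affected sample is homozygous alternate (2)
    and every unaffected sample carries 0 or 1 alternate alleles.
    Variants with a missing genotype for any listed sample are skipped.
    """
    hits = []
    for variant_id, calls in genotypes.items():
        affected_ok = all(calls.get(s) == 2 for s in affected)
        unaffected_ok = all(calls.get(s) in (0, 1) for s in unaffected)
        if affected_ok and unaffected_ok:
            hits.append(variant_id)
    return hits


# Toy trio: only var1 fits the recessive pattern.
genotypes = {
    "var1": {"child": 2, "mother": 1, "father": 1},
    "var2": {"child": 2, "mother": 2, "father": 1},
    "var3": {"child": 1, "mother": 0, "father": 1},
}
candidates = recessive_candidates(genotypes, ["child"], ["mother", "father"])
```

Skipping variants with missing genotypes is a conservative choice made for this sketch; the abstract notes that webGQT instead offers parameters to rescue missing genotypes.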

12.
Bioinformatics ; 35(22): 4782-4787, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31218349

ABSTRACT

SUMMARY: Large-scale human genetics studies are now employing whole genome sequencing with the goal of conducting comprehensive trait mapping analyses of all forms of genome variation. However, methods for structural variation (SV) analysis have lagged far behind those for smaller scale variants, and there is an urgent need to develop more efficient tools that scale to the size of human populations. Here, we present a fast and highly scalable software toolkit (svtools) and cloud-based pipeline for assembling high quality SV maps (including deletions, duplications, mobile element insertions, inversions and other rearrangements) in many thousands of human genomes. We show that this pipeline achieves similar variant detection performance to established per-sample methods (e.g. LUMPY), while providing fast and affordable joint analysis at the scale of ≥100 000 genomes. These tools will help enable the next generation of human genetics studies. AVAILABILITY AND IMPLEMENTATION: svtools is implemented in Python and freely available (MIT) from https://github.com/hall-lab/svtools. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genome, Human; Software; Humans; Sequence Deletion; Whole Genome Sequencing
13.
Nat Genet ; 51(1): 88-95, 2019 01.
Article in English | MEDLINE | ID: mdl-30531870

ABSTRACT

Deep catalogs of genetic variation from thousands of humans enable the detection of intraspecies constraint by identifying coding regions with a scarcity of variation. While existing techniques summarize constraint for entire genes, single gene-wide metrics conceal regional constraint variability within each gene. Therefore, we have created a detailed map of constrained coding regions (CCRs) by leveraging variation observed among 123,136 humans from the Genome Aggregation Database. The most constrained CCRs are enriched for pathogenic variants in ClinVar and mutations underlying developmental disorders. CCRs highlight protein domain families under high constraint and suggest unannotated or incomplete protein domains. The highest-percentile CCRs complement existing variant prioritization methods when evaluating de novo mutations in studies of autosomal dominant disease. Finally, we identify highly constrained CCRs within genes lacking known disease associations. This observation suggests that CCRs may identify regions under strong purifying selection that, when mutated, cause severe developmental phenotypes or embryonic lethality.


Subjects
Genome, Human/genetics; Open Reading Frames/genetics; Chromosome Mapping/methods; Developmental Disabilities/genetics; Humans; Mutation/genetics
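The abstract above defines constrained coding regions as coding stretches with a scarcity of variation. The sketch below illustrates only the core notion of finding variant-free stretches within one coding interval. The inclusive coordinate convention, the single-interval input, and the function name are assumptions for this example, and the percentile ranking mentioned in the abstract is not reproduced.

```python
def constrained_regions(region_start, region_end, variant_positions):
    """Return variant-free stretches as (start, end, length), longest first.

    Coordinates are inclusive. Positions outside the region are ignored.
    """
    regions = []
    cursor = region_start  # first position not yet assigned to a stretch
    for pos in sorted(set(variant_positions)):
        if pos < region_start or pos > region_end:
            continue
        if pos > cursor:
            regions.append((cursor, pos - 1, pos - cursor))
        cursor = pos + 1
    if cursor <= region_end:
        regions.append((cursor, region_end, region_end - cursor + 1))
    return sorted(regions, key=lambda r: r[2], reverse=True)


# Toy coding interval 100..130 with variants observed at three positions.
regions = constrained_regions(100, 130, [105, 106, 125])
longest = regions[0]
```

In this toy input the longest stretch spans positions 107 to 124, which is the kind of within-gene detail a single gene-wide metric would hide.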
14.
NPJ Genom Med ; 3: 22, 2018.
Article in English | MEDLINE | ID: mdl-30109124

ABSTRACT

Early infantile epileptic encephalopathy (EIEE) is a devastating epilepsy syndrome with onset in the first months of life. Although mutations in more than 50 different genes are known to cause EIEE, current diagnostic yields with gene panel tests or whole-exome sequencing are below 60%. We applied whole-genome analysis (WGA) consisting of whole-genome sequencing and comprehensive variant discovery approaches to a cohort of 14 EIEE subjects for whom prior genetic tests had not yielded a diagnosis. We identified both de novo point and INDEL mutations and de novo structural rearrangements in known EIEE genes, as well as mutations in genes not previously associated with EIEE. The detection of a pathogenic or likely pathogenic mutation in all 14 subjects demonstrates the utility of WGA to reduce the time and costs of clinical diagnosis of EIEE. While exome sequencing may have detected 12 of the 14 causal mutations, 3 of the 12 patients received non-diagnostic exome panel tests prior to genome sequencing. Thus, given the continued decline of sequencing costs, our results support the use of WGA with comprehensive variant discovery as an efficient strategy for the clinical diagnosis of EIEE and other genetic conditions.

15.
Nucleic Acids Res ; 46(W1): W186-W193, 2018 07 02.
Article in English | MEDLINE | ID: mdl-29873782

ABSTRACT

Functional genomics assays produce sets of genomic regions as one of their main outputs. To biologically interpret such region-sets, researchers often use colocalization analysis, where the statistical significance of colocalization (overlap, spatial proximity) between two or more region-sets is tested. Existing colocalization analysis tools vary in the statistical methodology and analysis approaches, thus potentially providing different conclusions for the same research question. As the findings of colocalization analysis are often the basis for follow-up experiments, it is helpful to use several tools in parallel and to compare the results. We developed the Coloc-stats web service to facilitate such analyses. Coloc-stats provides a unified interface to perform colocalization analysis across various analytical methods and method-specific options (e.g. colocalization measures, resolution, null models). Coloc-stats helps the user to find a method that supports their experimental requirements and allows for a straightforward comparison across methods. Coloc-stats is implemented as a web server with a graphical user interface that assists users with configuring their colocalization analyses. Coloc-stats is freely available at https://hyperbrowser.uio.no/coloc-stats/.


Subjects
Genomics/methods; Software; Chromatin Immunoprecipitation; GATA1 Transcription Factor/metabolism; Internet; Sequence Analysis, DNA; User-Computer Interface
16.
Gigascience ; 7(7)2018 07 01.
Article in English | MEDLINE | ID: mdl-29860504

ABSTRACT

SV-plaudit is a framework for rapidly curating structural variant (SV) predictions. For each SV, we generate an image that visualizes the coverage and alignment signals from a set of samples. Images are uploaded to our cloud framework where users assess the quality of each image using a client-side web application. Reports can then be generated as a tab-delimited file or annotated Variant Call Format (VCF) file. As a proof of principle, nine researchers collaborated for 1 hour to evaluate 1,350 SVs each. We anticipate that SV-plaudit will become a standard step in variant calling pipelines and the crowd-sourced curation of other biological results. Code available at https://github.com/jbelyeu/SV-plaudit. Demonstration video available at https://www.youtube.com/watch?v=ono8kHMKxDs.


Subjects
Genomics/methods; High-Throughput Nucleotide Sequencing; Medical Informatics/methods; Sequence Alignment; Sequence Analysis, DNA; False Positive Reactions; Genetic Variation; Genome, Human; Humans; Internet; Software
17.
Nat Genet ; 50(5): 727-736, 2018 04 26.
Article in English | MEDLINE | ID: mdl-29700473

ABSTRACT

Genomic association studies of common or rare protein-coding variation have established robust statistical approaches to account for multiple testing. Here we present a comparable framework to evaluate rare and de novo noncoding single-nucleotide variants, insertion/deletions, and all classes of structural variation from whole-genome sequencing (WGS). Integrating genomic annotations at the level of nucleotides, genes, and regulatory regions, we define 51,801 annotation categories. Analyses of 519 autism spectrum disorder families did not identify association with any categories after correction for 4,123 effective tests. Without appropriate correction, biologically plausible associations are observed in both cases and controls. Despite excluding previously identified gene-disrupting mutations, coding regions still exhibited the strongest associations. Thus, in autism, the contribution of de novo noncoding variation is probably modest in comparison to that of de novo coding variants. Robust results from future WGS studies will require large cohorts and comprehensive analytical strategies that consider the substantial multiple-testing burden.


Subjects
Autism Spectrum Disorder/genetics; Genetic Predisposition to Disease/genetics; INDEL Mutation/genetics; Polymorphism, Single Nucleotide/genetics; Protein Isoforms/genetics; Female; Genome/genetics; Genome-Wide Association Study/methods; Humans; Male
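The abstract above reports a correction for 4,123 effective tests. To give a sense of how strict such a correction is, the sketch below computes a per-test significance threshold. The Bonferroni-style form and the family-wise level of 0.05 are assumptions made for this example; the abstract states only the number of effective tests.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold for a family-wise error rate of `alpha`."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# 4,123 is the number of effective tests given in the abstract;
# alpha = 0.05 is an assumed family-wise level.
threshold = bonferroni_threshold(0.05, 4123)
```

Under these assumptions an annotation category would need a p-value on the order of 1e-5 to count as significant, which helps explain why large cohorts are called for.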
18.
Nat Commun ; 9(1): 572, 2018 02 05.
Article in English | MEDLINE | ID: mdl-29402882

ABSTRACT

The originally published version of this Article contained an error in Figure 4. In panel a, grey boxes surrounding the subclones associated with patients #2 and #4 obscured adjacent portions of the heatmap. This error has now been corrected in both the PDF and HTML versions of the Article.

19.
Nat Methods ; 15(2): 123-126, 2018 02.
Article in English | MEDLINE | ID: mdl-29309061

ABSTRACT

GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation.


Subjects
Breast Neoplasms/genetics; Genome, Human; Genomics/methods; Search Engine/methods; Sequence Analysis, DNA/methods; Software; Databases, Genetic; Female; Humans; Internet
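The GIGGLE abstract above describes searching query features against many genome interval files and ranking the files by shared loci. The sketch below shows a much simpler version of that task: counting overlaps with sorted start and end lists and ranking files by raw overlap count. The file names, the single-chromosome closed-interval model, and all function names are assumptions for illustration. This is not GIGGLE's index, and it ranks by count only, without the significance scoring the abstract mentions.

```python
from bisect import bisect_left, bisect_right


def build_index(intervals):
    """Sort interval starts and ends separately for binary search."""
    starts = sorted(start for start, _ in intervals)
    ends = sorted(end for _, end in intervals)
    return starts, ends


def count_overlaps(index, query_start, query_end):
    """Count closed intervals that overlap [query_start, query_end].

    Intervals starting at or before the query end, minus those that
    end before the query start.
    """
    starts, ends = index
    return bisect_right(starts, query_end) - bisect_left(ends, query_start)


def rank_files(files, queries):
    """Rank interval files by total overlaps with the query intervals."""
    scores = {}
    for name, intervals in files.items():
        index = build_index(intervals)
        scores[name] = sum(count_overlaps(index, s, e) for s, e in queries)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: two hypothetical interval files on one chromosome.
files = {
    "fileA": [(10, 20), (30, 40), (50, 60)],
    "fileB": [(100, 200)],
}
queries = [(15, 35), (55, 58)]
ranking = rank_files(files, queries)
```

Each overlap count costs two binary searches, so query time grows with the logarithm of the file size; rebuilding the index per call, as done here for brevity, would be replaced by a persistent index in any real tool.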
20.
Nat Commun ; 8(1): 1231, 2017 11 01.
Article in English | MEDLINE | ID: mdl-29093439

ABSTRACT

Metastatic breast cancer remains challenging to treat, and most patients ultimately progress on therapy. This acquired drug resistance is largely due to drug-refractory sub-populations (subclones) within heterogeneous tumors. Here, we track the genetic and phenotypic subclonal evolution of four breast cancers through years of treatment to better understand how breast cancers become drug-resistant. Recurrently appearing post-chemotherapy mutations are rare. However, bulk and single-cell RNA sequencing reveal acquisition of malignant phenotypes after treatment, including enhanced mesenchymal and growth factor signaling, which may promote drug resistance, and decreased antigen presentation and TNF-α signaling, which may enable immune system avoidance. Some of these phenotypes pre-exist in pre-treatment subclones that become dominant after chemotherapy, indicating selection for resistance phenotypes. Post-chemotherapy cancer cells are effectively treated with drugs targeting acquired phenotypes. These findings highlight cancer's ability to evolve phenotypically and suggest a phenotype-targeted treatment strategy that adapts to cancer as it evolves.


Subjects
Breast Neoplasms/drug therapy; Breast Neoplasms/genetics; Clonal Evolution; Drug Resistance, Neoplasm/genetics; Breast Neoplasms/pathology; Cells, Cultured; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; High-Throughput Nucleotide Sequencing/methods; Humans; Mutation; Phenotype; Signal Transduction/genetics; Single-Cell Analysis/methods